Microarchitectural Miss/Execute Decoupling

Authors

  • Amir Roth
  • Craig B. Zilles
  • Gurindar S. Sohi
Abstract

The decoupled access/execute architecture described a machine that enables the access of memory values to be decoupled from the consumption of those values. Although never widely adopted in its original form, the decoupled design is a compelling way to tolerate memory latency. In this paper, we propose and demonstrate a novel implementation of decoupling, one based on the following two refinements of the original idea. First, because the latency of cache hits can generally be tolerated, we only decouple from the main program accesses that are likely to miss in the cache. Second, our decoupling takes place at the microarchitectural level, not the architectural level. By treating the access stream as a speculative thread and not allowing it to modify the architectural state of the machine, we relax the correctness constraints that were placed on it in the original design. For many programs, this added flexibility enables a level of decoupling and, consequently, latency tolerance that could not be achieved under the more constrained architectural model.

Similar resources

The Synergy of Multithreading and Access/Execute Decoupling

This work presents and evaluates a novel processor microarchitecture which combines two paradigms: access/execute decoupling and simultaneous multithreading. We investigate how both techniques complement each other: while decoupling features an excellent memory latency hiding efficiency, multithreading supplies the in-order issue stage with enough ILP to hide the functional unit latencies. Its...

Improving Latency Tolerance of Multithreading through Decoupling

The increasing hardware complexity of dynamically scheduled superscalar processors may compromise the scalability of this organization to make an efficient use of future increases in transistor budget. SMT processors, designed over a superscalar core, are therefore directly concerned by this problem. This work presents and evaluates a novel processor microarchitecture which combines two paradi...

Multiple-pass Pipelining: Enhancing In-order Microarchitectures to Out-of-order Performance

Out-of-program-order execution has become almost a ubiquitous characteristic of modern processors because of its ability to tolerate variable memory-instruction latency. As designs are becoming increasingly power-conscious, the cost and complexity of the components of out-of-order execution are becoming problematic. Compilers have generally proven adept at planning useful static instruction-lev...

Multithreaded Decoupled Access/Execute Processors

This work presents and evaluates a novel processor microarchitecture which combines two paradigms: access/execute decoupling and simultaneous multithreading. We investigate how both techniques complement each other in the design of high performance next generation ILP processors. While decoupling features an excellent memory latency hiding efficiency, simultaneous multithreading supplies the in...

Performance and power effectiveness in embedded processors customizable partitioned caches

This paper explores an application-specific customization technique for the data cache, one of the foremost area/power consuming and performance determining microarchitectural features of modern embedded processors. The automated methodology for customizing the processor microarchitecture that we propose results in increased performance, reduced power consumption and improved determinism of cri...

Publication date: 2000